Abstract

Bayesian statistics within the framework of the maximum entropy concept has been widely
used for inferential problems, in particular to infer dynamic properties of strongly
correlated fermion systems from Quantum-Monte-Carlo (QMC) imaginary-time data. In current
applications, however, a consistent treatment of the error-covariance of the QMC data is
missing. Here we present a closed Bayesian approach that accounts consistently for the
QMC data.

PACS numbers: 71.20.Ad
I. INTRODUCTION



Bayesian statistics provides a general and consistent framework for logical inference based
on incomplete and noisy data and prior knowledge. Combined with the entropic Prior it is
referred to as quantified maximum entropy (MaxEnt) and yields the most probable
and unbiased result, given the data-constraints and prior knowledge. MaxEnt was originally
introduced to infer celestial images from incomplete and noisy radio-astronomical
data. Subsequently, it has been applied successfully to various other data-analysis problems.

Here we will focus on the ill-posed inversion problem encountered in Quantum-Monte-Carlo
(QMC) simulations. In this field, MaxEnt has become a standard and successful
technique to infer dynamic properties of strongly correlated fermion systems from
imaginary-time QMC data, a task that intrinsically amounts to an inverse Laplace transform.
QMC yields values for dynamic quantities along the imaginary-time axis for a finite number
of times. The inversion is not unique due to the limited number of data and the presence of
statistical errors. A direct inversion of the Laplace transform would grossly overfit the
noise, and the desired signal would be buried underneath it. Bayesian probability theory
provides a consistent framework to separate the signal from the noise.

A complication arises, however, in the present application, as the errors of the QMC data
are correlated. It has been proposed to include the error-covariance matrix as exact
data-constraints. This approach has been heavily debated, as it leads to the following dilemma.
For a standard QMC sample size, the off-diagonal elements of the covariance matrix are not
negligible and ought to be taken into account. However, the errors of the covariance matrix
are huge, and the information provided by the QMC covariance matrix is useless and in many
applications even disadvantageous. The error of the covariance matrix can be decreased by
increasing the sample size. This procedure is, however, computationally very expensive and,
moreover, needless, as the correlation of the errors decreases at the same time. The dilemma
obviously arises from the neglect of the errors of the covariance matrix. Here we present a
closed Bayesian approach to account fully for the noisy QMC data plus covariance matrix.
This scheme allows one to infer reliable results with the least amount of computer time.

Bayes' theorem aims at determining the posterior probability (posterior) $P(A|DCH)$ of
the sought-for quantity $A$, given hypotheses $H$ and QMC data $D$, which are related to $A$
via

\[ D_i = D_i(\{A\}) . \qquad (1.1) \]

The experimental data $D_i$ deviate from the exact values by the statistical QMC errors
$\eta_i$. Due to the QMC algorithm, the errors are correlated, and the information about the
error-covariance matrix will be denoted by $C$. Bayes' theorem relates the posterior to the
likelihood function $P(D|ACH)$, which contains the error statistics of the QMC data, and the
prior probability $P(A|CH)$:

\[ P(A|DCH) = \frac{P(D|ACH)\,P(A|CH)}{P(D|CH)} . \qquad (1.2) \]

The most honest Prior should summarize all our prior knowledge (the knowledge we have
about $A$ prior to receiving the experimental data $D$) and nothing more, i.e. it should
otherwise be as ignorant as possible. Our prior knowledge is part of the hypotheses
$H$. If nothing is known about the solution $A$, the most ignorant Prior would simply be
$P(A|CH) = \mathrm{const}$. (If nothing is known about a coin which we are going to toss, common
sense tells us to assign equal prior probabilities to the events "head (tail) shows up".)

In the case of a positive, additive distribution function (PAD), e.g. the spectral density,
the most ignorant Prior is the entropic Prior

\[ P(A|CH) = \frac{1}{Z_S} \exp\Big( \alpha \underbrace{\int \Big[ A(\omega) - m(\omega) - A(\omega) \ln\frac{A(\omega)}{m(\omega)} \Big] d\omega}_{S} \Big) . \qquad (1.3) \]

$Z_S$ is the normalization constant guaranteeing $\int P(A|CH)\,\mathcal{D}A = 1$. In the following we will
assume, as indicated in Eq. (1.3), that $A = A(\omega)$ is a function of the frequency $\omega$. The entropy
$S$ is measured relative to a default model $m(\omega)$, which contains "weak prior assumptions"
that can still be overruled by the data-constraints.
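The entropy functional of Eq. (1.3) is straightforward to evaluate on a frequency grid. The following is a minimal sketch, assuming a uniform grid and a simple Riemann sum; the grid, the flat default model, and the Gaussian test spectrum are hypothetical choices made only for illustration.

```python
import numpy as np

def entropy(A, m, d_omega):
    """Relative entropy S = integral of [A - m - A ln(A/m)] on a uniform grid."""
    A = np.asarray(A, dtype=float)
    m = np.asarray(m, dtype=float)
    # A ln(A/m) tends to 0 as A -> 0, so zero entries are skipped to avoid log(0).
    log_term = np.zeros_like(A)
    pos = A > 0
    log_term[pos] = A[pos] * np.log(A[pos] / m[pos])
    return float(np.sum(A - m - log_term) * d_omega)

# Hypothetical grid, flat default model, and test spectrum.
omega = np.linspace(-10.0, 10.0, 401)
d_omega = omega[1] - omega[0]
m = np.full_like(omega, 1.0 / (omega[-1] - omega[0]))
A = np.exp(-0.5 * omega**2) / np.sqrt(2.0 * np.pi)

S_default = entropy(m, m, d_omega)  # vanishes when A equals the default model
S_peak = entropy(A, m, d_omega)     # negative for any other A
```

Since the integrand is never positive, $S$ reaches its maximum of zero only at $A = m$, which is why the default model is recovered in the absence of data.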



Next we turn to the determination of the Likelihood, the central topic of this paper.
Usually this part is dealt with in one sentence: "the Likelihood function is given by a
Gaussian". Here we will be more specific about why and when this is true. The Likelihood
quantifies the probability for the realization of the specific data values $D$ measured in the
experiment, supposing the exact function $A$ were known. Given $A$, the exact values $D^{ex}$ in
data-space are also known (Eq. (1.1)). The Likelihood therefore describes the error statistics
of the QMC data,

\[ P(D|ACH) =: p(D^{ex} - D) = p(\eta) \quad \text{given } C \text{ and } H . \qquad (1.4) \]

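In the following, $C$ stands for the error-covariance matrix

\[ C_{ij} = \langle \eta_i \eta_j \rangle = \langle (D^{ex}_i - D_i)(D^{ex}_j - D_j) \rangle , \quad i,j = 1,\dots,N_d , \qquad (1.5) \]

measured by QMC. $N_d$ is the number of data values $D_i$.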

To begin with, we assume that the exact values of the error-covariance are known. The
resulting problem is to determine the PAD $p(\eta)$ given the constraints

\[ C_{ij} = \int p(\eta)\,\eta_i \eta_j \, d\eta , \qquad (1.6) \]
\[ 1 = \int p(\eta)\, d\eta . \qquad (1.7) \]



This is a problem falling into the realm of Jaynes' MaxEnt, which is analogous
to deriving the barometric formula, Maxwell's velocity distribution, or Fermi and Bose
statistics. In this framework, $p(\eta)$ is obtained upon maximizing the entropy, subject to the
exact data-constraints. Treating the constraints with Lagrange parameters, we seek
the maximum of

\[ \mathcal{L} = S - \sum_{ij} \lambda_{ij} \Big( \int p(\eta)\,\eta_i \eta_j\, d\eta - C_{ij} \Big) - \lambda_0 \Big( \int p(\eta)\, d\eta - 1 \Big) , \qquad (1.8) \]

with $\lambda_0$, $\lambda_{ij}$ being Lagrange parameters. Upon maximizing $\mathcal{L}$ with respect to $p(\eta)$ one obtains
an analytic expression for the solution

\[ p(\eta) = \frac{1}{Z}\, e^{-\sum_{ij} \lambda_{ij} \eta_i \eta_j} . \qquad (1.9) \]



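An ignorant, flat default model ($\int m(\omega)\,d\omega = 1$) has been assumed. $Z$ is determined via the
normalization constraint

\[ Z = \int e^{-\sum_{ij} \lambda_{ij} \eta_i \eta_j}\, d\eta = \frac{\pi^{N_d/2}}{\sqrt{\det(\lambda_{ij})}} , \qquad (1.10) \]

with $N_d$ the number of data values. The covariance constraint implies

\[ C_{ij} = -\frac{\partial \ln Z}{\partial \lambda_{ij}} = \frac{1}{2} (\lambda^{-1})_{ij} \;\Rightarrow\; \lambda_{ij} = \frac{1}{2} (C^{-1})_{ij} . \qquad (1.11) \]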

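Hence the Likelihood is the ubiquitous normal distribution

\[ p(\eta) = \frac{1}{\sqrt{\det(2\pi C)}}\, e^{-\frac{1}{2} \sum_{ij} \eta_i (C^{-1})_{ij} \eta_j} , \qquad (1.12) \]

which simplifies to a product of independent Gaussians if the errors are uncorrelated,
$C_{ij} = \delta_{ij} \sigma_i^2$:

\[ p(\eta) = \prod_i \frac{1}{\sqrt{2\pi\sigma_i^2}}\, e^{-\eta_i^2 / (2\sigma_i^2)} . \qquad (1.13) \]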
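The two forms of the Likelihood, Eqs. (1.12) and (1.13), can be compared numerically. The sketch below assumes a symmetric, positive definite covariance matrix and evaluates the logarithm of the normal distribution via a Cholesky factorization; the function names and test values are hypothetical.

```python
import numpy as np

def log_likelihood(eta, C):
    """ln p(eta) for the correlated normal distribution, Eq. (1.12)."""
    eta = np.asarray(eta, dtype=float)
    C = np.asarray(C, dtype=float)
    # The Cholesky factor C = L L^T yields both the quadratic form and det C.
    L = np.linalg.cholesky(C)
    z = np.linalg.solve(L, eta)                 # z.z = eta^T C^{-1} eta
    log_det = 2.0 * np.sum(np.log(np.diag(L)))  # ln det C
    n = eta.size
    return float(-0.5 * (z @ z) - 0.5 * (n * np.log(2.0 * np.pi) + log_det))

def log_likelihood_uncorrelated(eta, sigma):
    """ln p(eta) for uncorrelated errors, Eq. (1.13)."""
    eta = np.asarray(eta, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return float(np.sum(-0.5 * (eta / sigma) ** 2
                        - 0.5 * np.log(2.0 * np.pi * sigma**2)))

# Hypothetical errors and standard deviations.
eta = np.array([0.3, -0.1, 0.2])
sigma = np.array([0.5, 0.2, 0.4])

ll_diag = log_likelihood(eta, np.diag(sigma**2))
ll_prod = log_likelihood_uncorrelated(eta, sigma)
```

For a diagonal covariance matrix both functions return the same value; the Cholesky step fails when $C$ is singular, which is exactly the situation discussed further below for a small number of bins.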

Unfortunately, this handy result would be valid only if the QMC error-covariance were known
exactly, which is not the case. Therefore, the errors of the covariance matrix have to be
treated on the same footing as the errors of the QMC data $D$ in the first place: by quantified
MaxEnt.

Again, the posterior for $p(\eta)$, given the QMC error-covariance $C_{ij}$, the statistical
errors $\sigma_{ij}$ of $C_{ij}$, and all our hypotheses, can be determined via Bayes' theorem

\[ P(p|C\sigma H) = \frac{P(C|p\sigma H)\,P(p|H)}{P(C|\sigma H)} . \qquad (1.14) \]

Superfluous conditions have been discarded. Again the entropic Prior is invoked. QMC
simulations provide the statistical error of the covariance matrix. Further information
is not available from present QMC simulations. We therefore assume that the errors $\sigma_{ij}$ are
uncorrelated and known; this is part of our hypotheses $H$. A generalization beyond this
assumption is straightforward. Along the lines presented above, the Likelihood reads

\[ P(C|p\sigma H) \propto \exp\Big( -\frac{1}{2} \sum_{ij} \frac{\big( C_{ij} - \int p(\eta)\,\eta_i \eta_j\, d\eta \big)^2}{\sigma_{ij}^2} \Big) . \qquad (1.15) \]



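The MaxEnt result is hence obtained upon maximizing the posterior
$P(p|C\sigma H) \propto \exp(\alpha S - \frac{1}{2}\chi^2)$, where $-\frac{1}{2}\chi^2$ denotes the exponent of
Eq. (1.15), or rather

\[ \mathcal{L}(p, C) = \alpha S - \frac{1}{2} \chi^2 . \qquad (1.16) \]

The solution can be cast into the same functional form (Eq. (1.9)) as in the case of exact
data-constraints; merely the determination of the Lagrange parameters is modified, due
to the presence of noise, to

\[ C_{ij} + \alpha \sigma_{ij}^2 \lambda_{ij} = \int p(\eta)\,\eta_i \eta_j\, d\eta = \frac{1}{2} (\lambda^{-1})_{ij} . \qquad (1.17) \]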



This relation agrees with Eq. (1.11) for $\sigma_{ij} = 0$. In practice, however, the errors of the
covariance matrix are considerable. Particularly if only few QMC data are available, the
inclusion of the errors of the covariance matrix is essential to obtain an unbiased estimator
for the dynamic quantities. The covariance matrix $C_{ij}$ is determined from QMC data via
$C_{ij} \propto \sum_{\nu=1}^{N_{bin}} \Delta D_i^\nu \Delta D_j^\nu$, where $D_i^\nu$ represents the independent
measurements (bins) of $D_i$, as discussed below. It is obvious that the rank of $C$ is less than
or equal to the number of bins ($N_{bin}$). Hence, if the number of bins is less than the dimension
of the covariance matrix, the inverse of $C$, entering Eq. (1.12), does not exist, and the
regularization term in Eq. (1.17) is essential (no matter how small $\sigma_{ij}$ is) to determine $p(\eta)$.
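The rank argument can be checked with synthetic data. The sketch below assumes that the covariance is that of the bin average, i.e. a prefactor $1/[N_{bin}(N_{bin}-1)]$, which is an assumption made here and not taken from the text; the numbers of bins and time slices and the random data are likewise hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins, n_tau = 20, 50  # hypothetical: fewer bins than time slices

# Synthetic bin averages with correlated errors (stand-in for QMC output).
mixing = rng.normal(size=(n_tau, n_tau))
bins = rng.normal(size=(n_bins, n_tau)) @ mixing.T

# Deviations of each bin from the average over all bins.
dev = bins - bins.mean(axis=0)

# Assumed normalization: covariance of the bin average.
C = dev.T @ dev / (n_bins * (n_bins - 1))

rank = np.linalg.matrix_rank(C)  # cannot exceed the number of bins
```

Because the deviations are taken relative to the bin average, the rank is in fact at most `n_bins - 1`, so the 50 x 50 matrix `C` of this example cannot be inverted.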

Hence the Likelihood $P(D|ACH)$ remains a normal distribution; its covariance, however,
is not the QMC error-covariance but has to be determined via Eq. (1.17). The
regularization parameter $\alpha$ entering Eq. (1.17) can be determined either self-consistently, upon
maximizing the marginal posterior $P(\alpha|C\sigma H)$, or via the historic condition, which fixes
$\chi^2$ to the number of data constraints. We employ the historic approach, since we know that
the number of good degrees of freedom is small and both stopping criteria will yield
essentially the same result.
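The effect of the regularization term can be made explicit for a single data point, where Eq. (1.17) reduces to the scalar relation $c + \alpha\sigma^2\lambda = 1/(2\lambda)$ with a closed-form positive root. The sketch below is only this one-dimensional illustration, not the full matrix problem; the function names and the value of $\alpha$ are hypothetical.

```python
import math

def lagrange_parameter(c, sigma, alpha):
    """Positive root lam of  c + alpha*sigma**2*lam = 1/(2*lam)."""
    # Rationalized form of the quadratic solution; also valid for sigma = 0.
    return 1.0 / (c + math.sqrt(c * c + 2.0 * alpha * sigma * sigma))

def effective_variance(c, sigma, alpha):
    """Variance 1/(2*lam) of the resulting normal distribution."""
    return 0.5 / lagrange_parameter(c, sigma, alpha)

alpha = 1.0  # hypothetical regularization parameter

v_exact = effective_variance(c=1.0, sigma=0.0, alpha=alpha)     # exact covariance
v_noisy = effective_variance(c=1.0, sigma=0.5, alpha=alpha)     # noisy covariance
v_singular = effective_variance(c=0.0, sigma=0.5, alpha=alpha)  # vanishing measured variance
```

For $\sigma = 0$ the measured variance is recovered, a noisy covariance broadens the Likelihood, and even a vanishing measured variance yields a finite width as long as $\sigma > 0$, which is the scalar analogue of the singular covariance matrix.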
II. APPLICATION TO THE SPECTRAL PROPERTIES OF STRONGLY CORRELATED ELECTRONS



As a typical and topical problem we study the dynamic properties of the Hubbard model,
which is presently the subject of intense analytical and numerical studies. A detailed
understanding of the dynamic properties of strongly correlated electrons is essential for the
theoretical description of the high-temperature superconductors. The Hubbard model reads

\[ H = -t \sum_{\langle i,j \rangle, \sigma} \big( c^\dagger_{i,\sigma} c_{j,\sigma} + \mathrm{h.c.} \big) + U \sum_i n_{i\uparrow} n_{i\downarrow} \qquad (2.1) \]

with the hopping matrix element $t$ between two adjacent sites; $c_{i,\sigma}$ ($c^\dagger_{i,\sigma}$) destroys (creates) an
electron of spin $\sigma$ on site $i$, $\langle i,j \rangle$ denotes nearest neighbors, $U$ is the Coulomb repulsion
for two electrons of opposite spin on the same site, and $n_{i\sigma} = c^\dagger_{i,\sigma} c_{i,\sigma}$.

Unfortunately, dynamic properties cannot be measured directly by QMC simulations.
Dynamical information is provided by Matsubara Green's functions $D_l = -\langle T_\tau P(\tau_l) Q(0) \rangle$
for discrete $\tau_l$-values on the imaginary-time axis, where $\tau_l = l\,\beta/L$, $l = 0, \dots, L$, and $L$ is the
number of time slices. $P$ and $Q$ are operators which define the correlation function. Here we
will consider the one-particle properties of strongly correlated fermions, and the operators
are therefore $P = c$ and $Q = c^\dagger$, respectively. In order to determine the spectral density
$A(\omega)$ for real frequencies $\omega$, the spectral theorem is applied:

\[ D_l = -\int A(\omega)\, \frac{e^{-\tau_l \omega}}{1 + e^{-\beta\omega}}\, d\omega , \qquad (2.2) \]

which is, as already stated above, an inverse Laplace transformation problem and
pathologically ill-posed.
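The ill-posedness of Eq. (2.2) becomes apparent once the kernel is discretized. The following sketch assumes a uniform frequency grid and a hypothetical Gaussian test spectrum; $\beta = 20$ and $L = 160$ are the values quoted for the present study (in units of $t$), while the grid limits are arbitrary choices.

```python
import numpy as np

def fermion_kernel(tau, omega, beta):
    """K[l, k] = exp(-tau_l * omega_k) / (1 + exp(-beta * omega_k))."""
    tau = np.asarray(tau, dtype=float)[:, None]
    omega = np.asarray(omega, dtype=float)[None, :]
    # logaddexp(0, x) = ln(1 + e^x); this avoids overflow for large |omega|.
    return np.exp(-tau * omega - np.logaddexp(0.0, -beta * omega))

beta, L = 20.0, 160
tau = np.arange(L + 1) * beta / L       # tau_l = l * beta / L, l = 0..L
omega = np.linspace(-10.0, 10.0, 201)   # hypothetical frequency grid
d_omega = omega[1] - omega[0]

# Hypothetical test spectrum, normalized on the grid.
A = np.exp(-0.5 * (omega - 1.0) ** 2)
A /= A.sum() * d_omega

K = fermion_kernel(tau, omega, beta)
D = -(K @ A) * d_omega                  # forward transform, Eq. (2.2)

# Ratio of largest to smallest singular value of the discretized kernel.
condition_number = np.linalg.cond(K)
```

The forward transform is cheap and stable, whereas the condition number of `K` is enormous, so any statistical error in `D` is amplified drastically by a direct inversion. As a consistency check, `D[0] + D[-1]` equals minus the norm of the spectrum.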

Further information on the spectrum is provided by making use of the lowest-order
moments of $A(\omega)$,

\[ \mu_m = \int \omega^m A(\omega)\, d\omega , \qquad (2.3) \]

which are given by commutation relations, $\mu_m = \langle \{ [c, H]_m , c^\dagger \} \rangle$, and are of simple form for
$m = 1, 2$ [3].



III. RESULTS 



In order to compare the QMC/ME data with exact results, we consider a chain of $N = 12$
sites, which is still accessible by exact diagonalization (ED) techniques. The QMC
simulations were done for an inverse temperature of $\beta t = 20$ ($T = 0.05t$), where ground
state behavior is achieved for this system size and a comparison with the $T = 0$ ED results
is possible. After obtaining thermal equilibrium, up to 640000 sweeps through the space-time
Ising fields were performed.

Data from consecutive measurements are highly correlated, even for the same statistical
variable $D_l$. A study of the skewness (third moment) and the kurtosis (fourth moment) of
the data showed that, to obtain Gaussian behavior, at least 200 measurements, each separated
by 4 sweeps, have to be accumulated to form one bin. The results of a certain number
of bins are then used for the inversion process.
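The binning test described above can be sketched as follows. The synthetic, strongly skewed and uncorrelated samples are a hypothetical stand-in for the QMC measurements; only the bin size of 200 measurements is taken from the text.

```python
import numpy as np

def bin_averages(samples, bin_size):
    """Average consecutive measurements in bins of the given size."""
    samples = np.asarray(samples, dtype=float)
    n_bins = samples.shape[0] // bin_size
    trimmed = samples[: n_bins * bin_size]  # drop an incomplete last bin
    return trimmed.reshape(n_bins, bin_size, *samples.shape[1:]).mean(axis=1)

def skewness_and_excess_kurtosis(x):
    """Standardized third moment and fourth moment minus 3 (both 0 for a Gaussian)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    return (z**3).mean(axis=0), (z**4).mean(axis=0) - 3.0

rng = np.random.default_rng(1)
samples = rng.exponential(size=160_000)     # hypothetical non-Gaussian measurements

bins = bin_averages(samples, bin_size=200)  # 800 bins of 200 measurements
skew_raw, kurt_raw = skewness_and_excess_kurtosis(samples)
skew_bin, kurt_bin = skewness_and_excess_kurtosis(bins)
```

Both moments shrink towards their Gaussian values as the bin size grows, which is the criterion used to fix the bin size. Real QMC measurements are additionally correlated along the Markov chain, which this synthetic example does not model.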

Binning the data does not, however, suffice to get rid of all the correlations. One still has
to consider the correlations in imaginary time $\tau$ (i.e. between $D_l$ and $D_{l'}$). In Fig. 1 the
QMC result for one single bin is compared to the final shape of the Green's function for
$N_{bin} = 800$. Instead of being distributed 'at random' around the average, the data for this
bin are systematically lower than the average for $\tau < \beta/2$ and systematically higher for
$\tau > \beta/2$. These correlations may be reduced by forming larger and larger bins (i.e. using
more and more computation time). It is the aim of this paper to show that this is not a
sensible thing to do and that one can do better by taking into account the correlations
and particularly by accounting for the statistical errors of the covariance matrix.

In the following we will discuss the results of the MaxEnt procedure considering three
cases: (1) neglecting any information on the covariance matrix, (2) using the covariance
matrix only, and (3) taking into account both the covariance and its errors. To determine
the dependence on the number of bins and to give a quantitative argument for the
computational effort required, we show spectra resulting from QMC data for 200, 400 and
800 bins (160000, 320000 and 640000 sweeps, respectively). The number of time slices, which
corresponds to the dimension of the covariance matrix, is 160 in the present study.

Starting with 200 bins, which is slightly larger than the number of time slices, one can see
(first column in Fig. 2) that neither the MaxEnt reconstruction of the plain data (Fig. 2a) nor
the additional use of the covariance matrix (Fig. 2d) gives a reliable result. In both spectra
the structures are too pronounced and at the wrong position. It appears that the results are
generally better if the covariance matrix is not included, since the additional information is
treated as exact data-constraints although it suffers from pronounced statistical noise. If,
however, the statistical errors of the covariance matrix are taken into account (Fig. 2g), the
result reproduces the ED result very well. There is only a small overestimation of the
spectral weight at $\omega \approx -3$.

Increasing the number of bins to 400 still gives an overfitted result for the first case (Fig.
2b). Taking the covariance matrix into account (Fig. 2e) shows a slightly improved spectrum
for $\omega > 0$, but for $\omega \lesssim -2$ the spectral weight is suppressed completely. Again the best
spectrum is obtained if the errors of the covariance matrix are properly accounted for (Fig.
2h). The maximum at $\omega \approx -3$ is damped to the correct shape, and the agreement with the
ED result is now nearly perfect.

Eventually, for the large number of 800 bins, all three spectra show satisfactory results
(third column of Fig. 2). Only in the first case (Fig. 2c) does the ME curve still decrease too
fast for $\omega > 5$, leading to a wrong width.

The convergence of the various approaches is reasonable, since with increasing number
of bins the correlation of the QMC errors for different imaginary times vanishes, the
covariance matrix becomes diagonal, and the covariance of the errors can be ignored. At the
same time, the errors of the covariance matrix decrease, and the assumption of exact
data-constraints becomes justified as well.

Further investigations of the effect of the error of the covariance matrix revealed that the
procedure can be simplified by assuming a sufficiently large constant relative error (in our
case 20%). The results show no significant deviation from the results obtained with
the correctly generated errors.



IV. CONCLUSION 

We have shown that the imaginary-time data obtained by standard QMC simulations
suffer from strongly correlated statistical errors if only a small sample is used. It appears
reasonable to include this additional knowledge (the covariance matrix). However, it has
generally been observed that the QMC data for the covariance matrix are useless if the sample
size is small and needless if the sample size is large, so that it seemed generally advantageous
to ignore the off-diagonal elements of the covariance matrix altogether. At any rate, this
approach demands extremely long QMC runs (more than half a million sweeps were not even
sufficient in our case) and becomes increasingly impractical for larger system sizes (the
computation time for one sweep scales with $\sim N^3$). We have shown that reliable results can be
obtained regardless of the number of bins only if the covariance matrix and its errors are
treated consistently in the Bayesian framework.
FIGURES

FIG. 1. The data from one bin (points) compared to the final average over 800 bins (line).

FIG. 2. Comparison of the MaxEnt spectra (thick line) with the ED result (thin line) for
different parameters: first row (a,b,c): without use of the covariance (first case, see text), second
row (d,e,f): with covariance (second case), third row (g,h,i): with the error of the covariance matrix
(third case); as a function of the number of bins: first column (a,d,g): 200 bins, second column
(b,e,h): 400 bins, third column (c,f,i): 800 bins.